An RNN-based prosodic information synthesizer for Mandarin text-to-speech
Abstract
A new RNN-based prosodic information synthesizer for Mandarin Chinese text-to-speech (TTS) is proposed in this paper. Its four-layer recurrent neural network (RNN) generates prosodic information such as syllable pitch contours, syllable energy levels, syllable initial and final durations, and intersyllable pause durations. The input layer and first hidden layer operate on a word-synchronized clock to represent current-word phonologic states within the prosodic structure of the text to be synthesized. The second hidden layer and output layer operate on a syllable-synchronized clock and use the outputs of the preceding layers, along with additional syllable-level inputs fed directly to the second hidden layer, to generate the desired prosodic parameters. The RNN was trained on a large set of actual utterances with their associated texts, and it can automatically learn many human-prosody phonologic rules, including the well-known Tone 3 sandhi F0-change rule. Experimental results show that all synthesized prosodic parameter sequences matched their original counterparts quite well, and the approach was also tested in a pitch-synchronous overlap-add (PSOLA)-based Mandarin TTS system. Although formal subjective tests are difficult to perform and remain future work, we carried out informal listening tests with a significant number of native Chinese speakers, and the results confirmed that all synthesized speech sounded quite natural.
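To make the two clock rates concrete, the following is a minimal plain-Python sketch of a forward pass through a network organized this way. It is not the authors' implementation: the Elman-style tanh recurrence, the linear output layer, the names `TwoClockProsodyRNN` and `recurrent_step`, the layer sizes, the 8-dimensional output, and the feature encodings are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def recurrent_step(W_in, W_rec, x, h_prev):
    """Assumed Elman-style update: h = tanh(W_in x + W_rec h_prev)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_rec, h_prev))]


class TwoClockProsodyRNN:
    """Hypothetical four-layer RNN with a word clock and a syllable clock."""

    def __init__(self, word_in, word_hidden, syl_in, syl_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Input layer -> first hidden layer: updated once per word.
        self.W_word_in = rand_matrix(word_hidden, word_in, rng)
        self.W_word_rec = rand_matrix(word_hidden, word_hidden, rng)
        # Second hidden layer -> output layer: updated once per syllable.
        self.W_syl_in = rand_matrix(syl_hidden, word_hidden + syl_in, rng)
        self.W_syl_rec = rand_matrix(syl_hidden, syl_hidden, rng)
        self.W_out = rand_matrix(n_out, syl_hidden, rng)
        self.word_hidden = word_hidden
        self.syl_hidden = syl_hidden

    def forward(self, sentence):
        """sentence: list of (word_features, [syllable_features, ...]).

        Returns one prosodic-parameter vector per syllable.
        """
        h_word = [0.0] * self.word_hidden
        h_syl = [0.0] * self.syl_hidden
        outputs = []
        for word_feats, syllables in sentence:
            # Word clock ticks: update the word-level phonologic state.
            h_word = recurrent_step(self.W_word_in, self.W_word_rec, word_feats, h_word)
            for syl_feats in syllables:
                # Syllable clock ticks: the current word state is combined with
                # syllable-level inputs fed directly to the second hidden layer.
                x = h_word + syl_feats  # list concatenation
                h_syl = recurrent_step(self.W_syl_in, self.W_syl_rec, x, h_syl)
                # Assumed linear output layer (pitch, energy, durations, ...).
                outputs.append(matvec(self.W_out, h_syl))
        return outputs


if __name__ == "__main__":
    model = TwoClockProsodyRNN(word_in=4, word_hidden=6, syl_in=3, syl_hidden=5, n_out=8)
    # Two words: the first has two syllables, the second has one.
    sentence = [
        ([1.0, 0.0, 0.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
        ([0.0, 1.0, 1.0, 0.0], [[0.0, 0.0, 1.0]]),
    ]
    for i, params in enumerate(model.forward(sentence)):
        print(f"syllable {i}: {[round(p, 4) for p in params]}")
```

Running it prints one 8-dimensional vector per syllable (three in total). The point the sketch illustrates is that the word-level state is computed once per word and reused unchanged for every syllable of that word, while the syllable-level recurrent state carries context across word boundaries.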
Similar resources
An RNN-based spectral information generation for Mandarin text-to-speech
In this paper, an RNN-based spectral model is proposed to generate spectral parameters for Mandarin text-to-speech (TTS). The RNN is employed to learn the relations between the linguistic features and the spectral parameters. The phoneme-to-spectral parameter rules and the coarticulation rules between each two adjacent phones are automatically learned and memorized into the weights of RNN. The sy...
RNN-based prosodic modeling for Mandarin speech and its application to speech-to-text conversion
In this paper, a recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. The prosodic modeling is performed in the post-processing stage of acoustic decoding and aims at detecting word-boundary cues to assist in linguistic decoding. It employs a simple three-layer RNN to learn the relationship between input prosodic features, extracted f...
A Corpus-Based Prosodic Modeling Method for Mandarin and Min-Nan Text-to-Speech Conversions
This talk gives an introduction to a recurrent neural network (RNN) based prosody synthesis method for both Mandarin and Min-Nan text-to-speech (TTS) conversions. The method uses a four-layer RNN to model the dependency of output prosodic information and input linguistic information. Main advantages of the method are the capability of learning many human prosody pronunciation rules automaticall...
High-Quality Prosody Generation in Mandarin Text-to-Speech System
A text-to-speech (TTS) synthesizer is a computer-based system that can automatically read text aloud. Fujitsu is developing a Mandarin TTS system using state-of-the-art technologies. The prosodic structure of synthesized text provides important information for making synthetic speech produced by a TTS system more natural and understandable. This paper describes a global probability estimation m...
An NN-based Approach to Prosodic for Synthesizing English Words Em
In this paper, a neural network-based approach to generating proper prosodic information for spelling/reading English words embedded in background Chinese texts is discussed. It expands an existing RNN-based prosodic information generator for Mandarin TTS to an RNN-MLP scheme for Mandarin-English mixed-lingual TTS. It first treats each English word as a Chinese word and uses the RNN, trained fo...
Journal title:
- IEEE Trans. Speech and Audio Processing
Volume 6
Publication date: 1998